Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data
Abstract
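Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with various and unknown characteristics. Fully synthetic data usually consists of outliers and regular instances with clear characteristics and thus allows for a more meaningful evaluation of detection methods in principle. Nonetheless, there have only been few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty to arrive at a good coverage of different domains with synthetic data. In this work, we propose a generic process for the generation of datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We propose and describe a generic process for the benchmarking of unsupervised outlier detection, as sketched so far. We then describe three instantiations of this generic process that generate outliers with specific characteristics, like local outliers. To validate our process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of data reconstructed in this way. Next to showcasing the workflow, this confirms the usefulness of our proposed process. In particular, our process yields regular instances close to the ones from real data. Summing up, we propose and validate a new and practical process for the benchmarking of unsupervised outlier detection.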
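The core idea in the abstract (reconstruct regular instances from real data, then generate outliers with a known characteristic) can be illustrated with a minimal sketch. This is not the authors' method: the per-feature Gaussian model, the rejection-sampling rule, and every name below (`fit_diagonal_gaussian`, `sample_outliers`, `make_benchmark`, `min_distance`, `box_width`) are assumptions chosen only to keep the example self-contained.

```python
import math
import random
import statistics


def fit_diagonal_gaussian(rows):
    """Estimate a per-feature mean and standard deviation from regular instances."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against zero spread so later divisions stay defined.
    stds = [statistics.pstdev(col) or 1e-9 for col in columns]
    return means, stds


def scaled_distance(point, means, stds):
    """Distance from the model centre, each feature measured in standard deviations."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(point, means, stds)))


def sample_regular(means, stds, n, rng):
    """Reconstruct regular instances by sampling from the fitted model."""
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]


def sample_outliers(means, stds, n, rng, min_distance=4.0, box_width=8.0):
    """Rejection-sample points from a box around the model, keeping only distant ones.

    Assumed outlier characteristic: 'global' outliers that lie at least
    min_distance standard deviations away from the centre of the regular data.
    """
    outliers = []
    while len(outliers) < n:
        candidate = [rng.uniform(m - box_width * s, m + box_width * s)
                     for m, s in zip(means, stds)]
        if scaled_distance(candidate, means, stds) >= min_distance:
            outliers.append(candidate)
    return outliers


def make_benchmark(real_regular, n_regular, n_outliers, seed=0):
    """Return synthetic data, ground-truth labels (1 = outlier), and the fitted model."""
    rng = random.Random(seed)
    means, stds = fit_diagonal_gaussian(real_regular)
    data = (sample_regular(means, stds, n_regular, rng)
            + sample_outliers(means, stds, n_outliers, rng))
    labels = [0] * n_regular + [1] * n_outliers
    return data, labels, (means, stds)


# Toy stand-in for the regular instances of a real-world benchmark dataset.
real_regular = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.5, 9.0], [1.5, 13.0]]
data, labels, model = make_benchmark(real_regular, n_regular=50, n_outliers=5, seed=1)
```

Because the outliers are generated against an explicit model, their ground-truth labels and their defining characteristic are known by construction, which is what makes the evaluation interpretable. A more faithful instantiation would presumably use a richer model of the regular instances than independent Gaussians.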
Journal
Journal title: ACM Transactions on Knowledge Discovery From Data
Year: 2021
ISSN: 1556-472X, 1556-4681
DOI: https://doi.org/10.1145/3441453